With many potential predictors we can eliminate some by examining correlation plots
There is weak correlation between SOC_stock_spline and
the selected covariates. Additionally there is some collinearity between
predictors. The spectral predictors such as NDYI (Normalized Difference
Yellow Index) and NDVI (Normalized Difference Vegetation Index). We can
remove these based on the correlation coefficient > 0.7.
Now we can begin examining the distribution of carbon stock values in the dataset to choose the appropriate transformation. We also scale and center the numeric predictor variables.
Now build models using log transformed carbon stock data. We need to
specify that sample_ID is a random effect because of the
multiple samples at one location. lower_depth will be a
random slope to adjust model based on how it is affected by depth.
Also lower depth is included as a fixed effect to specify an overall
effect of depth in addition to accounting for the random slope within
sample_ID locations. See:
We also use a different optimizer for the lmer model which uses
nloptwrap by default and gives convergence warnings for
multiple model iterations. We use the bobyqa optimizer and
have assessed negligiable changes to log liklihood (< 0.1) and
parameter estimates (< 0.00001). See: - https://cran.r-project.org/web/packages/lme4/vignettes/lmerperf.html
help(convergence)We could use a Generalized Linear Model or a Generalized Linear Mixed Model here too but they often fail to converge. We will model with the log-transformed soil carbon stocks in the linear mixed model
Pairwise comparisons between the top, largest model and the rest show if models with fewer parameters are significantly different. Considering AIC will help determine the most parsimonious model with the best fit to the data.
From the ANOVAs it looks like model 7 has the lower AIC
when compared together.
To hone in on differences between variations in model 7
we can remove some of the variables and compare AICs for a more
parsimonious fit.
Model 7 is the lowest AIC
The model \(R^2\) for
mod7 is 0.822 The model \(R^2\) for mod3.1 is 0.82
The bootstrapped confidence intervals (CI) show the relative size of the effect of each parameter and we can see that WIP, GEO Pleistocene, lower depth, and the site interaction with MAS site are significant predictors
Anova between sites
Need to assess each parameter significance
Looking at some plots we can see some relationships between the
predicted values and WIP, Site and
DTM.
We can now visualize our candidate model 7 in a predicted vs. actual plot
Use the WIP threshold of 50% or 0.5
Trying to do quantile regression or some method that analyzes the highest SOC stock points
LQMM package
From Yu et al,.
Partial correlation measures the strength of relationship between two variables (e.g., SOC concentration and MAT) while removing the effects of one or more possibly confounding variables (e.g., geochemical variables) on this relationship. Differences between zero-order (i.e., no other variables have been controlled) and partial correlations indicate the degree of dependency of target predictors (e.g., MAT) on controlled predictors (e.g., geochemical variables).
So the interaction between two predictor variables.
We can visualize the interactions between site and DTM Elevation with interaction plots against SOC stock.